
Allow keeping functional test datasets #3418

Merged: 3 commits into distributed-system-analysis:main from pop, May 17, 2023

Conversation

@dbutenhof (Member) commented May 11, 2023

PBENCH-1148

Allow disabling the `test_delete_all` test by setting the `KEEP_DATASETS` environment variable ... along with some squirrelly mechanism to get this set in the proper context. (Oddly, adding it to the `tox.ini` "safe" list wasn't enough to allow setting it from outside; but we're already passing the server address as a command-line parameter, so I added a second.)

Ultimately, this translates to a simple `jenkins/runlocal --keep` command.
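Roughly, the plumbing has this shape; a minimal sketch, assuming hypothetical option handling (only `KEEP_DATASETS`, `tox.ini`, and the script names come from this PR):

```bash
# Hypothetical sketch only: the option parsing and names other than
# KEEP_DATASETS are assumptions, not the actual pbench scripts.

# Wrapper side (e.g. jenkins/runlocal): turn --keep into a variable.
keep=""
if [[ "$1" == "--keep" ]]; then
    keep=1
    shift
fi

# Because adding KEEP_DATASETS to the tox.ini "safe" list wasn't
# enough on its own, forward it the way the server address is already
# forwarded: as a positional command-line parameter.
server="$1"
./exec-tests "${server}" "${keep}"
```

On the far side, the test runner can re-export `KEEP_DATASETS` in the context where pytest actually runs, so the deletion test sees it and skips itself.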

@webbnh (Member) left a comment

Aside from some other small issues, isn't this change going to disrupt the ability to run individual tests?

@webbnh previously approved these changes May 15, 2023

@webbnh (Member) left a comment

Just got a question.

@webbnh (Member) left a comment

👍

@dbutenhof (Member, Author) commented

Well, this is interesting. Wonder if it'll recur ... ❔

Trying to pull quay.io/pbench/pbench-ci-fedora:main...
Error: initializing source docker://quay.io/pbench/pbench-ci-fedora:main: Requesting bearer token: invalid status code from registry 500 (Internal Server Error)

@webbnh (Member) commented May 15, 2023

> this is interesting

Yeah, and, despite that, we managed to get a journalctl dump... I wonder what it dumped?

@dbutenhof (Member, Author) commented

> > this is interesting
>
> Yeah, and, despite that, we managed to get a journalctl dump... I wonder what it dumped?

No, I think the failure was in the `jenkins/run` to fire up the functional tests; the server container and the backend pod appear to be up and running just fine.

@webbnh (Member) commented May 15, 2023

Mr. Jenkins managed to pull that container twice earlier in the job, but each time he pulls a fresh image. 😒

Google says that error indicates a disconnect between Quay and its back-end database.

Looking at the journalctl dump, it looks to me like the Server container started fine, so the trouble is with the container pulled to run the tests in.

@dbutenhof (Member, Author) commented

And now it fails the legacy test-51. This PR is haunted!

@webbnh (Member) commented May 15, 2023

> each time he pulls a fresh image. 😒

...and, this is by design, sort of. That is, we set `EXTRA_PODMAN_SWITCHES="--pull=always ..."` in the environment in the CI pipeline definition. Our intention (I think) was to make sure that we didn't use a stale container... but, now that we invoke `jenkins/run` several times in the job, I don't think we really want to re-pull the container each time.

We should probably change that to `--pull=newer`.... (In addition to saving us from redundant downloads, I think it might get us past this 500 error, too.)
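For context, the two podman pull policies in question behave like this; the `podman run` lines are illustrative (only the image name comes from the failure above):

```bash
# --pull=always: contact the registry and re-download the image on
# every invocation, even when a cached copy exists.
podman run --rm --pull=always quay.io/pbench/pbench-ci-fedora:main true

# --pull=newer: pull only if the registry image is newer than the
# locally cached one; otherwise run from the cache.
podman run --rm --pull=newer quay.io/pbench/pbench-ci-fedora:main true
```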

@dbutenhof (Member, Author) commented

> > each time he pulls a fresh image. 😒
>
> ...and, this is by design, sort of. That is, we set `EXTRA_PODMAN_SWITCHES="--pull=always ..."` in the environment in the CI pipeline definition. Our intention (I think) was to make sure that we didn't use a stale container... but, now that we invoke `jenkins/run` several times in the job, I don't think we really want to re-pull the container each time.

More efficient, to be sure; but in one sense that just makes the exposure points for whatever happened here less predictable. (I.e., whenever an executor finds a new `pbench-ci-fedora` image.)

> We should probably change that to `--pull=newer`.... (In addition to saving us from redundant downloads, I think it might get us past this 500 error, too.)

Yeah, well, it's particularly intriguing that it's revealed as a 500 error, which suggests that something triggered a Quay "loophole".

@webbnh (Member) commented May 16, 2023

> that just makes the exposure points for whatever happened here less predictable

Actually, I don't think it diminishes the predictability, but I think you're right that it doesn't necessarily improve it, either -- it has almost the same failure point. (That is, currently we would fail any time we couldn't pull the image; `newer` only takes action if the upstream image is different from the local one, and it is supposed to ignore pull errors, although I don't know whether that includes 500 errors....)

What we should do is pull the image at the start of the run (and fail early, if possible) and then use our cached image for the rest of the run, as sketched below. That might not be too hard to implement. (The problem is figuring out what "at the start of the run" means, across the cases of individual developer and CI, and from the `Pipeline.gy` file all the way down to `run-server-func-tests`, the RPM build, and `runlocal`....)
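A minimal sketch of that idea, assuming the CI wrapper can run a one-time setup step (`EXTRA_PODMAN_SWITCHES` and the image name come from this thread; everything else is illustrative):

```bash
# Pull exactly once, failing the whole run early if the registry is
# unhealthy (e.g. the 500 error seen above).
podman pull quay.io/pbench/pbench-ci-fedora:main || {
    echo "pbench-ci-fedora pull failed; aborting run" >&2
    exit 1
}

# Every later container invocation in the same run then uses the
# cached image rather than going back to the registry.
export EXTRA_PODMAN_SWITCHES="--pull=never"
```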

> something triggered a Quay "loophole".

I think it was a transient on their end, especially since it hasn't recurred.

@dbutenhof (Member, Author) commented

> I think it was a transient on their end, especially since it hasn't recurred.

Ah; so the failure in a critical network service was transient. Well, that's OK then. 🙉

@webbnh (Member) commented May 16, 2023

> that's OK then

Not so much "OK" as "beyond my control" (and, I expect, beyond my ability to do very much about).

But, if we didn't depend on pulling the container on every invocation, it might help.... 🥴

@dbutenhof merged commit 5d3ac3c into distributed-system-analysis:main May 17, 2023
@dbutenhof deleted the pop branch May 17, 2023 11:37